A tool to build a treebank for conversational Chinese

Authors

  • Yves Lepage
  • Nicolas Auclerc
  • Satoshi Shirai
Abstract

N-grams have been used extensively in speech recognition, with phonemes or words as basic units. Recently, it has been proposed to use n-grams with phrase tree structures as units in order to improve speech recognition quality. To test this idea on Chinese, a treebank of Chinese hotel reservation conversation utterances is needed. Because no such treebank is available yet, we have to build it. We propose to view the process of building a treebank as a sequence of editing and search operations:

Similar resources

Build a Chinese Treebank as the test suite for Chinese parser

This paper will introduce our current work to build a Chinese treebank that can be used as a test suite for Chinese parsers. The treebank will consist of 10,000 Chinese sentences extracted from a Chinese balanced corpus with about 2,000,000 Chinese characters. The corpus has already been annotated with correct segmentation and part-of-speech (POS) information. The following issues will be discuss...

Iterative Transformation of Annotation Guidelines for Constituency Parsing

This paper presents an effective algorithm of annotation adaptation for constituency treebanks, which transforms a treebank from one annotation guideline to another with an iterative optimization procedure, thus to build a much larger treebank to train an enhanced parser without increasing model complexity. Experiments show that the transformed Tsinghua Chinese Treebank as additional training d...

Build a Situation-based Language Knowledge Base

Language resources are very important for natural language processing research and applications. This paper will introduce our ongoing research work to build a situation-based language knowledge base for the Chinese language, based on two basic language resources: three Chinese semantic lexicons and a large scale Chinese treebank. We developed a supporting platform to make full use of the abund...

Build a Large-Scale Syntactically Annotated Chinese Corpus

This paper reports on our research to build a large-scale Tsinghua Chinese Treebank (TCT). We propose a two-stage approach to reduce manual proofreading labors as much as possible. The insertion of an intermediate functional chunk level creates a good information bridge to link simple chunk annotation with detailed syntactic tree annotation. We describe our chunk and tree annotation schemes, fo...

Extending and Scaling up the Chinese Treebank Annotation

We discuss ongoing efforts to scale up the Chinese Treebank annotation and to extend Chinese treebanking to informal genres like conversational speech, news groups and weblogs, as well as discussion forums. The original Chinese Treebank annotation scheme was designed for formal genres such as newswire and magazine articles, where the language is very formal and each document is carefully edite...

Journal title:

Volume   Issue 

Pages  -

Publication date: 2000